Exploring the Price of a Diamond Based on Its Features

Scatterplot: Carat vs. Price

ggpairs: Feature Correlation

The Demand of Diamonds

Scatterplot: Carat vs. Price

Price vs. Carat and Clarity

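Most frequent carat values (top row) and the number of diamonds with each (bottom row):

##  0.3 0.31 1.01  0.7 0.32    1 
## 2604 2249 2242 1981 1840 1558

Most frequent prices in USD (top row) and the number of diamonds at each (bottom row):

## 605 802 625 828 776 698 
## 132 127 126 125 124 121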
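Counts like the ones above come from tabulating a column and sorting the frequencies. A minimal sketch, assuming they were produced by something like `head(sort(table(diamonds$carat), decreasing = TRUE))` (the original call is not shown); the toy vector `carats` is hypothetical:

```r
# Hypothetical toy vector standing in for a carat column
carats <- c(0.3, 0.31, 0.3, 1.01, 0.3, 0.31, 0.7)

# table() counts each distinct value; sort(decreasing = TRUE) puts the
# most frequent values first; head() keeps the top entries (6 by default)
top_carats <- head(sort(table(carats), decreasing = TRUE))
print(top_carats)
```

The same call on a price column gives the second pair of rows.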

Price vs. Carat and Cut

Price vs. Carat and Color

Building the Linear Model for Price
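The five models in the table below are nested: each one adds a single predictor to the previous model. A minimal sketch of that build-up with `update()`, run on a small synthetic data frame (`toy`, with made-up coefficients) rather than the real `diamonds` data, so the numbers it prints are not the ones reported here:

```r
set.seed(42)

# Hypothetical synthetic data: only carat and an unordered cut factor
n <- 500
toy <- data.frame(
  carat = runif(n, 0.2, 3),
  cut   = factor(sample(c("Fair", "Good", "Ideal"), n, replace = TRUE))
)
# Made-up data-generating process on the log-price scale
toy$price <- exp(0.5 + 8 * toy$carat^(1/3) - 1 * toy$carat +
                 0.1 * (toy$cut == "Ideal") + rnorm(n, sd = 0.15))

# Same formula style as the document: log price on the cube root of carat
m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = toy)
# update() refits the model; "~ . + term" keeps the old terms and adds one
m2 <- update(m1, ~ . + carat)
m3 <- update(m2, ~ . + cut)

# Compare fit across the nested models
fits <- list(m1 = m1, m2 = m2, m3 = m3)
sapply(fits, function(m) c(r.squared = summary(m)$r.squared, AIC = AIC(m)))
```

For nested least-squares models R-squared cannot decrease as terms are added, so AIC and BIC (which penalize extra parameters) are the more useful columns for judging whether a new predictor earns its place.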

## 
## Calls:
## m1: lm(formula = I(log(price)) ~ I(carat^(1/3)), data = diamonds)
## m2: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat, data = diamonds)
## m3: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut, data = diamonds)
## m4: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color, 
##     data = diamonds)
## m5: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color + 
##     clarity, data = diamonds)
## 
## =============================================================================
##                       m1          m2          m3         m4          m5      
## -----------------------------------------------------------------------------
##   (Intercept)      2.821***    1.039***    0.874***    0.932***   0.415***   
##                   (0.006)     (0.019)     (0.019)     (0.017)    (0.010)     
##   I(carat^(1/3))   5.558***    8.568***    8.703***    8.438***   9.144***   
##                   (0.007)     (0.032)     (0.031)     (0.028)    (0.016)     
##   carat                       -1.137***   -1.163***   -0.992***  -1.093***   
##                               (0.012)     (0.011)     (0.010)    (0.006)     
##   cut: .L                                  0.224***    0.224***   0.120***   
##                                           (0.004)     (0.004)    (0.002)     
##   cut: .Q                                 -0.062***   -0.062***  -0.031***   
##                                           (0.004)     (0.003)    (0.002)     
##   cut: .C                                  0.051***    0.052***   0.014***   
##                                           (0.003)     (0.003)    (0.002)     
##   cut: ^4                                  0.018***    0.018***  -0.002      
##                                           (0.003)     (0.002)    (0.001)     
##   color: .L                                           -0.373***  -0.441***   
##                                                       (0.003)    (0.002)     
##   color: .Q                                           -0.129***  -0.093***   
##                                                       (0.003)    (0.002)     
##   color: .C                                            0.001     -0.013***   
##                                                       (0.003)    (0.002)     
##   color: ^4                                            0.029***   0.012***   
##                                                       (0.003)    (0.002)     
##   color: ^5                                           -0.016***  -0.003*     
##                                                       (0.003)    (0.001)     
##   color: ^6                                           -0.023***   0.001      
##                                                       (0.002)    (0.001)     
##   clarity: .L                                                     0.907***   
##                                                                  (0.003)     
##   clarity: .Q                                                    -0.240***   
##                                                                  (0.003)     
##   clarity: .C                                                     0.131***   
##                                                                  (0.003)     
##   clarity: ^4                                                    -0.063***   
##                                                                  (0.002)     
##   clarity: ^5                                                     0.026***   
##                                                                  (0.002)     
##   clarity: ^6                                                    -0.002      
##                                                                  (0.002)     
##   clarity: ^7                                                     0.032***   
##                                                                  (0.001)     
## -----------------------------------------------------------------------------
##   R-squared            0.924       0.935       0.939      0.951       0.984  
##   adj. R-squared       0.924       0.935       0.939      0.951       0.984  
##   sigma                0.280       0.259       0.250      0.224       0.129  
##   F               652012.063  387489.366  138654.523  87959.467  173791.084  
##   p                    0.000       0.000       0.000      0.000       0.000  
##   Log-likelihood   -7962.499   -3631.319   -1837.416   4235.240   34091.272  
##   Deviance          4242.831    3613.360    3380.837   2699.212     892.214  
##   AIC              15930.999    7270.637    3690.832  -8442.481  -68140.544  
##   BIC              15957.685    7306.220    3761.997  -8317.942  -67953.736  
##   N                53940       53940       53940      53940       53940      
## =============================================================================
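The `cut: .L`, `.Q`, `.C` and `^4` rows above are R's default orthogonal polynomial contrasts for ordered factors (linear, quadratic, cubic and quartic trends across the ordered levels). Unordered factors instead get treatment contrasts that compare each level with a baseline, which is the form of the `color: E/D` style rows in the second table. A small base-R check of those defaults, using hypothetical toy factors:

```r
# R's defaults: treatment contrasts for unordered factors,
# orthogonal polynomial contrasts for ordered factors
print(getOption("contrasts"))

# Ordered factor with 5 levels -> 4 polynomial columns (.L, .Q, .C, ^4)
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
cut_ord <- factor(cut_levels, levels = cut_levels, ordered = TRUE)
print(colnames(model.matrix(~ cut_ord)))

# Unordered factor -> one dummy per non-baseline level (baseline = "D")
color <- factor(c("D", "E", "F", "G"))
print(colnames(model.matrix(~ color)))
```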
Columns of the larger diamonds_big data set, used for the second set of models:

##  [1] "X"            "carat"        "cut"          "color"       
##  [5] "clarity"      "table"        "depth"        "cert"        
##  [9] "measurements" "price"        "x"            "y"           
## [13] "z"
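The calls below use a `logprice` column that is not among the names listed above, so it was presumably added before fitting. A minimal sketch of that assumed step and of the `price < 10000 & cert == "GIA"` row filter, on a hypothetical three-row stand-in (`big`) for `diamonds_big`:

```r
# Hypothetical stand-in for diamonds_big (values are made up)
big <- data.frame(
  carat = c(0.5, 1.2, 2.0),
  price = c(1500, 8000, 15000),
  cert  = c("GIA", "GIA", "IGI")
)

# Assumed step: add the log-price column the model formulas use
big$logprice <- log(big$price)

# Same row filter as in the lm() calls: under $10,000 and GIA-certified
subset_rows <- big[big$price < 10000 & big$cert == "GIA", ]
print(subset_rows)
```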
## 
## Calls:
## m1: lm(formula = logprice ~ I(carat^(1/3)), data = diamonds_big[diamonds_big$price < 
##     10000 & diamonds_big$cert == "GIA", ])
## m2: lm(formula = logprice ~ I(carat^(1/3)) + carat, data = diamonds_big[diamonds_big$price < 
##     10000 & diamonds_big$cert == "GIA", ])
## m3: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut, data = diamonds_big[diamonds_big$price < 
##     10000 & diamonds_big$cert == "GIA", ])
## m4: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut + color, 
##     data = diamonds_big[diamonds_big$price < 10000 & diamonds_big$cert == 
##         "GIA", ])
## m5: lm(formula = logprice ~ I(carat^(1/3)) + carat + cut + color + 
##     clarity, data = diamonds_big[diamonds_big$price < 10000 & 
##     diamonds_big$cert == "GIA", ])
## 
## =================================================================================
##                       m1           m2           m3          m4          m5       
## ---------------------------------------------------------------------------------
##   (Intercept)       2.671***     1.333***    0.949***    1.341***     0.665***   
##                    (0.003)      (0.012)     (0.012)     (0.010)      (0.007)     
##   I(carat^(1/3))    5.839***     8.243***    8.633***    8.110***     8.320***   
##                    (0.004)      (0.022)     (0.021)     (0.017)      (0.012)     
##   carat                         -1.061***   -1.223***   -0.782***    -0.763***   
##                                 (0.009)     (0.009)     (0.007)      (0.005)     
##   cut: Ideal                                 0.211***    0.181***     0.131***   
##                                             (0.002)     (0.001)      (0.001)     
##   cut: V.Good                                0.120***    0.090***     0.071***   
##                                             (0.002)     (0.001)      (0.001)     
##   color: E/D                                            -0.083***    -0.071***   
##                                                         (0.001)      (0.001)     
##   color: F/D                                            -0.125***    -0.105***   
##                                                         (0.001)      (0.001)     
##   color: G/D                                            -0.178***    -0.162***   
##                                                         (0.001)      (0.001)     
##   color: H/D                                            -0.243***    -0.225***   
##                                                         (0.002)      (0.001)     
##   color: I/D                                            -0.361***    -0.358***   
##                                                         (0.002)      (0.001)     
##   color: J/D                                            -0.500***    -0.509***   
##                                                         (0.002)      (0.001)     
##   color: K/D                                            -0.689***    -0.710***   
##                                                         (0.002)      (0.002)     
##   color: L/D                                            -0.812***    -0.827***   
##                                                         (0.003)      (0.002)     
##   clarity: I2                                                        -0.301***   
##                                                                      (0.006)     
##   clarity: IF                                                         0.751***   
##                                                                      (0.002)     
##   clarity: SI1                                                        0.426***   
##                                                                      (0.002)     
##   clarity: SI2                                                        0.306***   
##                                                                      (0.002)     
##   clarity: VS1                                                        0.590***   
##                                                                      (0.002)     
##   clarity: VS2                                                        0.534***   
##                                                                      (0.002)     
##   clarity: VVS1                                                       0.693***   
##                                                                      (0.002)     
##   clarity: VVS2                                                       0.633***   
##                                                                      (0.002)     
## ---------------------------------------------------------------------------------
##   R-squared             0.888        0.892       0.899       0.937        0.969  
##   adj. R-squared        0.888        0.892       0.899       0.937        0.969  
##   sigma                 0.289        0.284       0.275       0.216        0.154  
##   F               2700903.714  1406538.330  754405.425  423311.488   521161.443  
##   p                     0.000        0.000       0.000       0.000        0.000  
##   Log-likelihood   -60137.791   -53996.269  -43339.818   37830.414   154124.270  
##   Deviance          28298.689    27291.534   25628.285   15874.910     7992.720  
##   AIC              120281.582   108000.539   86691.636  -75632.827  -308204.540  
##   BIC              120313.783   108043.473   86756.037  -75482.557  -307968.400  
##   N                338946       338946      338946      338946       338946      
## =================================================================================
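Because the response is log price, `sigma` (the residual standard error) is on the log scale. Exponentiating it gives a rough multiplicative spread: a residual of one sigma corresponds to a price about `exp(sigma)` times the fitted value, and a residual of minus one sigma to about `exp(-sigma)` times it. A quick check using the m5 sigmas from the two tables (0.129 and 0.154):

```r
# Residual standard errors of m5 from the two tables (log-price scale)
sigma_m5 <- c(diamonds = 0.129, diamonds_big = 0.154)

# One sigma above the fit, expressed as a price multiplier
upper_factor <- exp(sigma_m5)
# One sigma below the fit
lower_factor <- exp(-sigma_m5)

print(round(rbind(upper_factor, lower_factor), 3))
```

So a one-sigma miss is roughly 14% to 17% above the fitted price, or 12% to 14% below it.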

Predictions

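Predicted price for one diamond, in USD: the point estimate (fit) and the lower (lwr) and upper (upr) interval bounds, back-transformed from the log scale:

##        fit     lwr      upr
## 1 5040.436 3730.34 6810.638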
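The `fit`, `lwr` and `upr` values are in dollars even though the models predict log price, which suggests the log-scale output of `predict()` was exponentiated. A minimal sketch of that step, assuming a 95% prediction interval and using hypothetical synthetic data and a hypothetical 1-carat diamond (not the one behind the numbers above):

```r
set.seed(1)

# Hypothetical synthetic data on the log-price scale
n <- 300
toy <- data.frame(carat = runif(n, 0.3, 2.5))
toy$logprice <- 0.7 + 8 * toy$carat^(1/3) - 0.8 * toy$carat + rnorm(n, sd = 0.15)

m <- lm(logprice ~ I(carat^(1/3)) + carat, data = toy)

# Hypothetical new diamond to price
new_diamond <- data.frame(carat = 1.0)

# Prediction interval on the log scale, then back to dollars with exp()
log_pred <- predict(m, newdata = new_diamond, interval = "prediction", level = 0.95)
price_pred <- exp(log_pred)
print(price_pred)
```

Since `exp()` is monotonic the interval bounds carry over directly, but under normally distributed log-scale errors `exp(fit)` estimates the median price rather than the mean.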